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Abstract — Vector  sensor  imaging  presents  a  challeng¬ 
ing  problem  in  covariance  estimation  when  allowing 
arbitrarily  polarized  sources.  We  propose  a  Stokes 
parameter  representation  of  the  source  covariance  ma¬ 
trix  which  is  both  qualitatively  and  computationally 
convenient.  Using  this  formulation,  we  adapt  the  prox¬ 
imal  gradient  and  expectation  maximization  (EM)  al¬ 
gorithms  and  apply  them  in  multiple  variants  to  the 
maximum  likelihood  and  least  squares  problems.  We 
also  show  how  EM  can  be  cast  as  gradient  descent  on 
the  Riemannian  manifold  of  positive  definite  matrices, 
enabling  a  new  accelerated  EM  algorithm.  Finally,  we 
demonstrate  the  benefits  of  the  proximal  gradient  ap¬ 
proach  through  comparison  of  convergence  results  from 
simulated  data. 

I.  Introduction 
A.  Vector  sensor  imaging 

The  developments  of  this  paper  are  motivated  by  the 
electromagnetic  vector  sensor  imaging  problem:  estimating 
the  magnitude,  polarization,  and  direction  of  plane  wave 
sources  from  a  sample  covariance  matrix  of  vector  mea¬ 
surements.  A  vector  sensor  (example  shown  in  Figure  1) 
measures  the  electromagnetic  field  at  a  single  point  using 
three  orthogonal  dipole  elements  and  three  orthogonal  loop 
elements  with  a  common  phase  center,  producing  a  six- 
element  measurement  vector  [2].  This  vector  represents  the 
full  state  of  the  electromagnetic  field  at  the  sensor’s  location 
[3],  implying  sensitivity  from  all  directions  and  complete 
polarization  information.  These  properties  make  even  a 
single  vector  sensor  ideal  for  determining  the  direction 
of  arrival  of  signal  sources  [2,  4,  5],  nulling  or  isolating 
specific  sources,  and  maximizing  the  statistics  collected  at 
a  single  location  [3].  In  this  application,  we  seek  to  use  a 
single  vector  sensor  to  make  a  map  of  signal  strength  and 
polarization  as  a  function  of  direction  of  arrival. 

We  model  the  measured  signals  in  terms  of  a  collection 
of  point  sources  distributed  equally  in  angle  on  the  sur- 
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Figure  1.  Atom  antenna  [1],  an  electromagnetic  vector  sensor.  The 
antenna  is  composed  of  three  orthogonal  loop  and  dipole  elements 
with  a  common  phase  center,  measuring  the  complete  electromagnetic 
field  in  a  six-element  vector. 

rounding  sphere  as  in  [3] .  The  magnitude  and  phase  of  each 
source  at  time  sample  n  of  N  is  collected  in  the  complex 
vector  cn.  The  six-element  measurement  vector  rn  is  then 
given  by 

rn=Acn+wn>  (!) 

where  A  is  the  matrix  of  steering  vectors  describing 

the  contribution  from  each  individual  source  and  wn  is 
measurement  noise.  We  assume  that  the  sources  have  an 
arbitrary  phase  with  respect  to  time  over  the  N  samples,  so 
we  impose  a  probabilistic  model  where  the  source  and  noise 
vectors  are  zero-mean  complex  normal  with  covariances  E 
and  crl: 

cn  ~  0,  E)  Vn 

wn  ~  CV(0,  <rl)  Vn  (2) 

rn  ~  0,  AEA*  +  crl)  Vn, 

where  I  denotes  the  identity  matrix.  Assuming  a  known 
estimate  of  the  noise  variance  a ,  our  objective  is  to  estimate 
the  source  covariance  E  from  the  measurements  rn. 

B.  Covariance  estimation 

One  approach  to  estimating  the  source  covariance  is 
to  seek  the  maximum  likelihood  solution  for  E,  with  the 
likelihood  given  by  the  complex  normal  distribution  for  rn. 


Equivalently,  we  can  minimize  the  negative  log-likelihood, 
leading  to  the  objective  function 

Hml(T,)  =  logdet(AEA*  +  CTl)-Mr((AEA*  Tal)"1^.  (3) 

The  measurements  are  present  only  in  the  sample  covari¬ 
ance  matrix  S ,  making  it  a  sufficient  statistic: 

JV-l 

S  =  X  Y1  Vn  ■  (4) 

n= 0 

Another  approach,  perhaps  more  convenient  because  of  its 
convexity  and  relative  simplicity,  is  to  minimize  the  squared 
error  between  the  estimated  measurement  covariance  and 
the  sample  covariance.  This  gives  the  least-squares  objective 

Hls^)  =  l\\S~(AEA*+al)f2.  (5) 

For  both  cases,  we  seek  to  solve  an  optimization  problem 
of  the  form 

minimize  Ff(E) 

u  E  ^  n  (pml.  Pis) 

subject  to  E  P  0, 

which  minimizes  the  given  objective  while  enforcing  the 
constraint  that  the  covariance  be  positive  semidefinite 
(PSD).  The  choice  of  objective  function  can  depend  on 
a  number  of  factors:  convergence  guarantees,  algorithm 
performance,  and  correctness.  Maximum  likelihood  solves 
the  desired  problem  from  a  statistical  standpoint,  but  it 
may  be  that  least-squares  also  produces  an  acceptable 
solution  for  a  given  case  while  being  more  practical  to 
implement. 


II.  Source  signal  polarization 


A.  Independent  polarized  source  model 
It  is  possible  to  solve  the  maximum  likelihood  and  least- 
squares  covariance  estimation  problems  for  a  general  S, 
but  a  significant  simplification  arises  by  assuming  that  the 
sources  are  statistically  independent  for  each  direction  in 
the  sky.  This  does  not,  however,  imply  that  the  source 
covariance  matrix  E  is  diagonal.  In  order  to  allow  for 
sources  with  arbitrary  polarization,  each  source  must  be 
represented  in  a  basis  of  two  orthogonal  polarizations.  The 
Jones  vector  z  fully  quantifies  the  state  of  a  two-dimensional 
transverse  wave  at  a  given  location  and  time: 


Ehez<l>h 

zh 

_Evei+*_ 

V_ 

Each  component  za  of  the  Jones  vector  gives  the  wave’s 
amplitude  and  phase  as  a  complex  number  for  the  corre¬ 
sponding  basis  direction,  in  this  case  horizontal  and  vertical. 
Therefore,  each  independent  arbitrarily-polarized  source 
requires  two  entries  in  cn  for  its  Jones  vector  components. 

Representing  the  horizontal  and  vertical  basis  compo¬ 
nents  of  each  source  with  corresponding  entries  in  vectors 
hn  and  vn,  we  can  partition  the  measurement  equation  (1) 


as 


A#  [-4  /,  *4,.] 


+  wn, 


(7) 


where  Ah  and  Av  give  the  steering  vectors  for  horizontal 
and  vertical  polarizations  at  each  source  location.  This 
leads  to  a  structured  source  covariance, 


(8) 


Note  that  assuming  independent  sources  for  each  pointing 
direction  implies  that  each  of  the  blocks  T,hh,  E hv ,  and 
E,(n;  must  be  diagonal. 


B.  Stokes  parameter  formulation 

It  is  convenient  both  qualitatively  and  computationally 
to  represent  E  in  terms  of  Stokes  parameters: 

y  —  I 

where  I,  Q,  U,  and  V  are  vectors  with  corresponding  entries 
for  each  independent  source.  This  satisfies  the  typical 
definition  of  Stokes  parameters  in  terms  of  second-order 
polarization  moments: 

I  =  diag(E/l?l  +  EtJ  U  =  2fR(diag(E,J) 

Q  =  diag(Eh/l  -  Y,vv)  V  =  — 23(diag(E/ltJ)). 

I  can  be  interpreted  as  total  intensity,  while  Q ,  U,  and  V 
describe  different  modes  of  polarization. 


diag (/  +  Q)  diag(I7  -  iV) 

diag(C7  +  iV)  diag(J  -  Q)  ' 


C.  Projection  onto  the  positive  semidefinite  cone 

Parametrization  in  terms  of  Stokes  parameters  is  conve¬ 
nient  because  it  becomes  easy  to  project  a  covariance  E 
onto  the  positive  semidefinite  cone: 

II  "II2 
argmin  E  — E 

S  "F  (Pproj) 

subject  to  EH. 


With  E  represented  by  Stokes  parameters  /,  Q,  U,  and  V, 
the  Frobenius  norm  in  (Pproj)  becomes 

Ilk  -  -'ll)  +  flic  -  ®ll)  +  III"  -  "II)  +  ill"  -  %  IT 

Similarly,  the  positive  semidefinite  condition  becomes 


IP  0 

E  ^  Oo  ,  „  „  „ 

I2  P  Q2  +  U2  +  V2, 


(12) 


where  P  on  the  right-hand  side  implies  elementwise  com¬ 
parison. 

This  vector  optimization  problem  is  separable  into  scalar 
components,  so  it  suffices  to  solve  the  problem  for  each 
(Ik,  Qk,  Uk,  Vk)  tuple  and  apply  the  resulting  solution 
element-wise  to  the  vector  Stokes  parameters.  First  define 

pk  =  'JQI  +  UZ  +  VI  (13) 

If  Ik  A  Pk,  no  projection  is  needed  and  the  solution  is 

4  =  4  Qk  =  Qk  Uk  =  uk  Vk  =  Vk.  (14) 


Otherwise,  evaluating  the  KKT  conditions  [6]  gives  a  closed- 
form  expression  for  the  unique  minimizer: 

4  =  max (1(4  +  Pfc),  0) 

Qk=I±Qk  Uk=  ^  Uk  Vk  =  ^  Vk.  1,1  ” 

‘k  4c  4c 

Essentially,  non-physical  Stokes  components  with  the  total 
intensity  less  than  the  polarized  intensity  are  scaled  to 
be  fully  polarized.  Since  the  general  solution  to  (Ppl.oj) 
requires  an  eigendecomposition  [7],  it  is  clear  that  projec¬ 
tion  in  terms  of  Stokes  parameters  produces  a  significant 
simplification. 

III.  Algorithms 

A.  Expectation  maximization 

Application  of  the  expectation  maximization  (EM)  algo¬ 
rithm  [8]  to  the  maximum  likelihood  covariance  estimation 
problem  is  well-known  [9,  10],  and  its  formulation  is  given 
in  Algorithm  1.  It  is  an  iterative  algorithm  that  makes  use 

Algorithm  1  EM  for  (Pml)  [9] _ 

given  £°,  measurements  (S',  a) 

repeat 

Efc+i  _  Efc  grad^  (Efe)£fc 
until  stopping  criterion  is  satisfied 


of  the  gradient  of  the  negative  log- likelihood  function, 

gradH  ;  (£)  =  A*R~X  (R  -  S)R~1A,  (16) 

where 

R  =  AHA*  +  crl.  (17) 

A  convenient  property  of  EM  is  that  it  implicitly  enforces 
the  condition  that  £  be  positive  semidefinite. 

B.  Proximal  gradient 

A  similar  algorithm  that  is  less  well-known  with  respect 
to  the  covariance  estimation  problem  is  proximal  gradient 
[11],  given  in  Algorithm  2  for  a  general  setting.  It  solves 

Algorithm  2  Proximal  Gradient  (PG)  [11] 

given  step  size  p,  x° 

repeat 

zk+ 1  xk  —  p  gradff(a;fe) 

xk+1  prox^F(2fe+1) 
until  stopping  criterion  is  satisfied 


split-objective  problems  of  the  form 

minimize  F(x)+H(x),  (Pproxgrad) 

where  F  must  admit  a  proximal  operator, 

proxF(4  =  argmin  (F{x)  +  |||a:  -  vf^j, 

X 

and  H  must  be  smooth  with  a  gradient  given  by  gradF(a;). 


For  both  the  maximum  likelihood  and  least-squares 
covariance  estimation  problems,  we  already  have  a  smooth 
H.  In  the  least-squares  case, 

gradff  (E)  =  A*  (ASA*  +  al  -  5) A.  (18) 

The  positive  semidefiniteness  constraint  can  be  incorpo¬ 
rated  into  F  as  an  indicator  function,  making  the  proximal 
operator  into  Euclidean  projection  onto  the  PSD  cone. 

Though  this  projection  adds  a  wrinkle  in  comparison 
to  the  EM  algorithm,  proximal  gradient  can  incorporate 
improvements  such  as  adaptive  step  size  [12,  13]  and 
acceleration  [14,  15]  that  enable  improved  convergence. 
In  addition,  it  provides  a  straightforward  path  to  include 
additional  constraints  or  regularization  that  can  be  useful 
for  solving  variants  of  the  covariance  estimation  problem. 

C.  EM  as  Riemannian  proximal  gradient 
Given  the  similarities  between  the  EM  and  proximal 
gradient  algorithms,  it  is  not  surprising  that  they  share  a 
connection  when  the  problem  is  viewed  through  the  lens 
of  gradient  descent  on  the  manifold  of  positive  definite 
matrices.  Manifold  optimization  relies  on  the  Riemannian 
gradient,  which  provides  the  direction  in  the  manifold’s 
tangent  plane  along  which  the  function  increases  most  [16, 
17].  On  the  E  y  0  manifold,  the  Riemannian  gradient  of 
the  maximum  likelihood  objective  is  a  modified  Euclidean 
gradient  [18]: 

grad™  ( (E)  =  E  grad^  (E)£.  (19) 

Notice  that  the  Riemannian  gradient  is  precisely  the 
expression  that  appears  in  the  EM  step  of  Algorithm  1. 
From  this  perspective,  EM  is  simply  unit-step  gradient 
descent  on  the  positive  definite  manifold. 

Introduction  of  a  variable  step  size  to  EM  is  possible 
through  an  appeal  to  the  manifold  optimization  concept  of 
a  retraction.  Useful  in  this  case  is  the  projective  retraction, 
which  takes  a  Euclidean  step  along  the  Riemannian  gradi¬ 
ent  followed  by  a  Euclidean  projection  onto  the  manifold 
[17]: 

Efe+1  =  Retproj  (£fe,  grad™  (Efc)) 

=  prox  (Sfe  — /i  grad™  (Efe)).  (20) 

11  ml 

Unfortunately,  projection  onto  the  open  set  of  positive 
definite  matrices  is  not  well-defined;  we  can  approximate 
by  projecting  onto  the  positive  semidefinite  cone  instead: 

Efe+1  =  prox7  (£fe  —  /i  grad™  (Efc)) .  (21) 

This  looks  like  proximal  gradient  (Algorithm  2)  applied  to 
the  maximum  likelihood  problem,  only  with  the  Rieman¬ 
nian  gradient  replacing  the  Euclidean  gradient. 

In  other  words,  by  taking  proximal  gradient  and  using  a 
Riemannian  gradient  step,  one  can  produce  a  new  family  of 
“EM”  algorithms.  Because  we  can  then  apply  convergence 
improvements  developed  for  proximal  gradient,  this  new 
Riemannian  proximal  gradient  algorithm  may  provide 
significant  benefit  over  EM. 
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Figure  2.  Convergence  in  likelihood  for  a  random  problem  instance.  The  accelerated,  adaptive-step  variant  converges  in  the  fewest  iterations 
for  all  problem  formulations:  least-squares  with  proximal  gradient  (LS-PG),  maximum  likelihood  with  EM  or  Riemannian  proximal  gradient 
(ML-EM),  and  maximum  likelihood  with  proximal  gradient  (ML-PG). 
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Figure  3.  Convergence  in  likelihood  over  all  runs  for  the  fastest  variants.  Trends  from  the  single  run  case  hold  over  many  instances.  ML-PG 
readily  converges  to  a  sparse  solution,  so  it  is  the  easy  choice  for  vector  sensor  imaging.  ML-EM  is  relatively  slow  because  it  is  constrained  to 
the  positive  definite  manifold.  LS-PG  is  not  competitive  enough  to  justify  the  relative  simplicity  of  its  implementation. 


IV.  Simulation  convergence  comparisons 

To  compare  algorithm  convergence  rates,  we  simulated 
150  instances  of  vector  sensor  measurements  for  two 
randomly  located  and  polarized  point  sources.  Figure  2 
shows  the  normalized  negative  log-likelihood,  scaled  and 
shifted  to  start  at  one  and  converge  to  zero,  for  a  single 
problem  instance  as  a  function  of  iteration  number.  It 
gives  results  for  proximal  gradient  applied  to  the  maximum 
likelihood  (ML-PG)  and  least  squares  (LS-PG)  problems 
and  EM  or  Riemannian  proximal  gradient  applied  to  the 
maximum  likelihood  problem  (ML-EM).  Each  algorithm 
was  tested  in  variants  with  either  a  fixed  step,  adaptive 


step,  or  adaptive  step  with  acceleration.  As  expected,  the 
accelerated,  adaptive-step  variant  converges  in  the  fewest 
iterations  for  all  problem  formulations.  The  improvements 
are  significant  in  this  instance,  with  fixed  step  ML-PG 
converging  in  over  10,000  iterations  while  the  accelerated 
adaptive-step  variant  converges  in  about  200  iterations. 

To  best  compare  the  problem  formulations  and  algo¬ 
rithms  between  each  other,  Figure  3  shows  the  convergence 
of  all  150  instances  for  the  three  configurations  using  an 
adaptive  step  size  and  acceleration.  Fastest  convergence  in 
general  is  achieved  when  solving  the  maximum  likelihood 
problem.  Between  standard  proximal  gradient  using  the 


Euclidean  gradient  and  our  “EM”  proximal  gradient  using 
the  Riemannian  gradient,  standard  proximal  gradient 
generally  converges  faster.  The  least-squares  formulation, 
even  though  it  does  eventually  converge  to  the  maximum 
likelihood  solution  in  these  cases,  is  not  competitive  enough 
to  justify  the  relative  simplicity  of  its  implementation. 

The  main  performance  difference  between  formulations 
likely  arises  from  how  well  they  handle  the  sparsity  of 
the  true  solution.  Since  only  two  sources  exist  in  the  true 
solution,  the  source  covariance  is  mostly  zero  and  decidedly 
positive  semidefinite.  Since  EM,  with  the  Riemannian 
gradient,  operates  on  the  positive  definite  manifold,  it 
is  at  a  decided  disadvantage  in  trying  to  converge  to  a 
positive  semidefinite  solution.  It  is  for  this  reason  that 
we  find  ML-PG  most  suitable  for  use  with  vector  sensor 
imaging.  However,  the  new  ML-EM  algorithm  may  be  of 
great  benefit  in  other  problem  settings  where  classic  EM 
already  excels  and  a  positive  definite  solution  is  expected. 

V.  Conclusion 

Vector  sensor  imaging  poses  a  challenge  for  covariance 
estimation.  The  EM  algorithm  solves  the  maximum  like¬ 
lihood  problem,  but  its  slow  rate  of  convergence  makes 
EM  less  than  ideal.  The  proximal  gradient  algorithm,  in 
contrast,  is  well  suited  to  the  vector  sensor  maximum 
likelihood  problem  and  can  also  be  applied  to  the  least- 
squares  covariance  estimation  problem.  EM  and  proximal 
gradient  are  similar,  but  EM  employs  the  Riemannian 
gradient  over  the  manifold  of  positive  definite  matrices. 
From  this,  we  formulated  a  new  Riemannian  proximal 
gradient  algorithm  combining  the  Riemannian  gradient  of 
EM  with  the  projection  and  variable  step  size  of  proximal 
gradient. 

The  difficulty  in  applying  any  form  of  proximal  gradient 
is  that  it  employs  projection  onto  the  positive  semidefinite 
cone,  which  can  be  slow  and  pose  numerical  problems  for 
estimating  a  general  covariance  matrix.  We  showed  that  for 
independent  but  arbitrarily  polarized  signal  sources,  Stokes 
parameters  can  be  used  to  represent  the  source  covariance 
matrix  and  simplify  projection  onto  the  PSD  cone. 

We  compared  the  convergence  of  the  proximal  gradient 
algorithms  applied  to  both  the  maximum  likelihood  and 
least-squares  problems  using  simulated  data.  For  the  vector 
sensor  imaging  problem,  we  showed  that  the  Euclidean 
proximal  gradient  maximum  likelihood  algorithm  converged 
in  the  fewest  iterations,  likely  because  the  solution  was 
sparse  and  unreachable  from  exact  Riemannian  gradient 
steps.  Nevertheless,  we  also  showed  that  Riemannian 
proximal  gradient  improves  upon  EM,  so  it  may  be  useful 
in  other  applications  for  which  EM  is  better-suited. 
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